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Nonlinear Modeling of Time Series 

using 

Multivariate Adaptive Regression Splines (MARS) 

P.A.W. Lewis J.G. Stevens 

Naval Postgraduate School, Monterey CA 



SUMMARY 

MARS is a new methodology, due to Friedman, for nonlinear regression modeling. MARS can be concep- 
tualized as a generalization of recursive partitioning that uses spline fitting in lieu of other simple functions. 
Given a set of predictor variables, MARS fits a model in the form of an expansion in product spline basis 
functions of predictors chosen during a forward and backward recursive partitioning strategy. MARS pro- 
duces continuous models for high dimensional data that can have multiple partitions and predictor variable 
interactions. Predictor variable contributions and interactions in a MARS model may be analyzed using an 
ANOVA style decomposition. 

By letting the predictor variables in MARS be lagged values of a time series, one obtains a new method for 
nonlinear autoregressive threshold modeling of time series. A significant feature of this extension of MARS is 
its ability to produce models with limit cycles when modeling time series data that exhibit periodic behavior. 
In a physical context, limit cycles represent a stationary state of sustained oscillations, a satisfying behavior 
for any model of a time series with periodic behavior. Analysis of the Wolf sunspot numbers with MARS 
appears to give an improvement over existing nonlinear Threshold and Bilinear models. 



Keywords: MULTIVARIATE ADAPTIVE REGRESSION SPLINES; NONLINEAR TIME SERIES MODELS; 
RECURSIVE PARTITIONING; REGRESSION SPLINES; THRESHOLD MODELS; ASTAR 
MODELS; WOLF SUNSPOT NUMBERS; LIMIT CYCLES 

1 INTRODUCTION 

Regression modeling is a frequently applied statistical technique that serves as a basis for studying and 
characterizing a system of interest. We use regression modeling to formulate a reasonable mathematical model 
of the relationship between the predictor and response variables of the system. The choice of a modeling form 
may be based on previous knowledge of the system or on considerations such as smoothness and continuity of 
the response and predictor variables. 

Let y represent a single response variable that depends on a vector of p predictor variables x where 
x = (xi, . . . } x Vi . . . ,£p). Assume we are given N samples of y and x } namely { y g , ic » } j and that y is de- 
scribed by the regression model, 

y = /(*!,... ,* p ) + f (1) 
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over some domain D C 9i p , which contains the data. The function f(x) reflects the true but unknown relationship 
between y and x. The random additive error variable c, which is assumed to have mean zero and variance 
reflects the dependence of y on quantities other than x. The goal is to formulate a function /( x) that is a 
reasonable approximation of f(x) over the domain D. If the correct parametric form of f(x) is known, then we 
can use parametric regression modeling to estimate a finite number of unknown coefficients. However, in this 
paper the approach is nonparametric regression modeling (Eubank, 1988). We only assume that f(x) belongs 
to a general collection of functions and rely on the data to determine the final model form and its associated 
coefficients. 

In the first part of this paper we explain Multivariate Adaptive Regression Splines (MARS) (Friedman, 1988), 
a new method of flexible nonparametric regression modeling that appears to be an improvement over existing 
methodology when using moderate sample sizes N with dimension p > 2. Next, we introduce the use of MARS 
for modeling in a univariate time series context, x r for r — 1,2, ... A r , i.e., the predictor variables are the lagged 
values of the response variable x T . The result is a multivariate adaptive autoregressive spline model for the time 
series. Note that the discussion of MARS in this paper is a simple introduction that is only complete enough to 
motivate the extension to time series modeling with MARS. For further details on MARS see Friedman (1988). 

In the regression context, MARS can be conceptualized as a generalization of a recursive partitioning strategy 
(Morgan and Sonquist, 1963; Breiman et al., 1984) which uses spline fitting in lieu of other simple functions. 
Given a set of predictor variables, MARS fits a model in the form of an expansion in product spline basis 
functions of predictors chosen during a forward and backward recursive partitioning strategy. Although MARS 
is a computationally intensive regression methodology, it can produce continuous models for high dimensional 
data that can have multiple partitions and predictor variable interactions. Predictor variable contributions and 
interactions in a MARS model may be analyzed using an ANOVA style decomposition. 

Although MARS is capable of regression modeling in low dimensional environments p < 2, its primary advan- 
tages exist in higher dimensions. A difficulty with applying existing multivariate regression modeling methodolo- 
gies to problems of dimension greater than two has been called the curse-of- dimensionality (Bellman, 1961). The 
curse- of- dimensionality describes the need for an exponential increase in sample size N for a linear increase in p, 
in order to densely populate higher dimensional spaces. MARS attempts to overcome the curse- of- dimensionality 
by exploiting the localized low-dimensional structure of the data used in constructing f(x). 

With MARS, by letting the predictor variables be lagged values of a time series, one obtains a new method 
for nonlinear threshold modeling of time series we call ASTAR (Adaptive Spline Threshold Autoregression). 
We illustrate this methodology by applying ASTAR to simple autoregressive and nonlinear threshold models. 
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A significant feature of ASTAR is its ability to produce models with limit cycles when modeling time series 
data that exhibit periodic behavior. In a physical context, limit cycles represent a stationary state of sustained 
oscillations, a satisfying behavior for any model of a time series with periodic behavior. Our analysis of the Wolf 
sunspot numbers with ASTAR appears to improve existing nonlinear Threshold and Bilinear models. 

In this paper the approach taken to explain MARS is geometric in nature; we focus on the iterative formation 
of overlapping subregions in the domain D of the predictor variables. Each subregion of the domain is associated 
with a product spline basis function. MARS approximates the unknown function /(x) using the set of product 
spline basis functions associated with the overlapping subregions of the domain. To motivate the development of 
the MARS procedure, the next two sections briefly review recursive partitioning and regression splines. Section 4 
is a discussion of Friedman’s innovations used to develop MARS. An algorithm for implementing MARS is 
addressed in section 5. The application of MARS to the modeling of time series is discussed in Section 6. 

2 RECURSIVE PARTITIONING (RP) 

The origin of recursive partitioning regression methodology appears to date to the development and use of 
the AID (Automatic Interaction Detection) program by Morgan and Sonquist in the early 1960’s. More recent 
extensions and contributions were made by Breiman et al. (1984). We explain recursive partitioning using 
recursive splitting of established subregions which is recast as an expansion in a set of basis functions. The latter 
explanation of recursive partitioning may be considered a precursor to MARS. 

2.1 RP: Recursive Splitting of Established Subregions 

Let the response variable y depend in some unknown way on a vector of p predictor variables x = (xj , . . . , x p ), 

that we model with (1). Assume we have N samples of y and x, namely {y^x,*}^. Let {Rj}f :=l be a set of S 

s 

disjoint subregions of D such that D — |^J Rj. Given the subregions recursive partitioning estimates 

the unknown function /(x) at x with 

/(*) = {/>(*) ( 2 ) 

where the function fj{x) estimates the true but unknown function /(x) over the Rj th subregion of D. In 
recursive partitioning, fj(x) is usually taken to be a constant (Morgan and Sonquist, 1963 and Breiman et 
al., 1984) although linear functions have been proposed without much success (Breiman and Meisel, 1976). For 
the purpose of explaining MARS fj(x) is a constant function, 

fj(x) = Cj V xeRj, (3) 
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whore each Cj is chosen to minimize the jth component of the residual-squared-error (badness-of-fit), 

BOF[fj(x)] = min ^(yi-cj) 2 . (4) 

Since the subregions of the domain D are disjoint, each cj will be the sample mean of the y,’s whose {*i}£Li € Rj . 

In general, the recursive partitioning model is the result of a 2-step procedure that starts with the single 
subregion R\ = D. The first, or forward, step uses recursive splitting of established subregions to iteratively 
produce a large number of disjoint subregions {Rj}jL 2 , for M > S, where M is chosen by the user. The second, 
or backward, step reverses the first step and trims the excess (M — S) subregions using a criterion that evaluates 
both the model fit and the number of subregions in the model. The goal of the 2-step procedure is to use the 
data to select a good set of subregions {Rj}j =l together with the constant functions Cj that estimate f(x) over 
each subregion of the domain. 

To facilitate understanding of the recursive partitioning algorithm we examine the forward step procedure 
for an example problem using p = 3 predictor variables, and M = 5, the maximum number of forward step 
subregions. Let v = 1, . . . ,p index the predictor variables and k = 1, . . . , n index the ordered sample values of a 
predictor variable x v in subregion Rj. For our purposes we use BOF m = l 5 OF [//(as)] as the forward step 
measure of fit for a recursive partitioning model with m subregions and restrict the set of candidate partition 
points to the actual sample values, x v> k. Note that x Vt k represents the kth ordered sample value of the nth 
predictor variable while x v alone denotes the running values of the rth predictor variable. At the start of the 
forward step recursive partitioning algorithm, R\ is the entire domain D and the single subregion estimate for 
/(*) is 

f(x) = fx(x) = c, = 

i = l 

The forward step measure of fit for the single subregion recursive partitioning model is 

N 

BOFi = ^(j/i — Ci) 2 . (6) 

» = 1 

The initial recursion, m = 2, for the forward step algorithm selects a partition point t* that best splits 
subregion Ri into two disjoint sibling subregions. The method for discovering t* is straightforward exhaustive 
search; evaluate every sample value x Vt k (for v — = l,...,n) as a candidate partition point to 

determine which one minimizes the remaining badness-of-fit for a m — 2 subregion model. For example, let 
t — identify a candidate partition point for predictor variable X\. The area in parent subregion R\ to the 
left of t y X\ < t , resides in proposed sibling subregion R\j. The area to the right of t, t <X\, resides in proposed 
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sibling subregion R\ r . Given the proposed split of R\ along t = we evaluate the model using BOF rn for 

a m = 2 subregion model, i.e., 

BOF 2 = min ^ (y, - c;) 2 + min ^ (y,-c r ) 2 . (7) 

e ' *>e« 1,1 <r 

Using the indices t> and the exhaustive search sequentially evaluates all possible partition points for each 
predictor variable in R u which here is equal to D. 

For our example problem, let the partition point t* = £ 2,25 identify the split of subregion R\ that minimizes 
the forward step fit criterion BOF m for a m = 2 subregion recursive partitioning model. We use £ 2,25 t° create 
two new disjoint subregions during the split and elimination of the old parent region Ri* . First, the area in 
parent subregion Ri + to the left of t* i.e., £2 < t* is assigned to sibling subregion R 2 while the area to the right 
of t* i.e., t * < £2 is reconstituted as subregion R\. The creation of the two new disjoint subregions R\ and 
R 2 and the elimination of the old parent subregion Ri* increase by one the number of disjoint subregions that 
partition D and finish the initial recursion of the forward step procedure. Thus, the two subregion recursive 
partitioning estimate of f(x) for our example problem is 

/(*) = { Cj :x e Rj for i = 1,2}, (8) 

where, since we are splitting the domain D on only one dimension, namely £ 2 , 

p f ft tf *2 > *2,25 

^ \ R 2 if *2 < * 2 , 25 . 

Note that the form of the recursive partitioning model (2) did not change during the recursion, only the number 
of disjoint subregions that partition D. 

The recursions m = 3, . . . , M — 5 of the forward step algorithm, are a repeat of the first recursion with 

one exception. The exhaustive search is now conducted to identify the best split for one and only one of the 

subregions from the current m— 1 subregion model. Each recursion’s partition point t* is selected as before, after 
an evaluation of all potential partition points for each predictor variable in the existing subregions {Rj of 
the model. The recursive splitting continues until the domain D is partitioned into M = 5 disjoint subregions 
{ Rj } | _ j . Upon completion of the forward step recursive partitioning algorithm, a backward step algorithm 
trims excess subregions using a criterion that evaluates both fit and the number of subregions in the model. See 
(Friedman, 1988) for a discussion of the backward step algorithm. Completion of the backward step procedure 
results in the final recursive partitioning model with {Rj}j = j subregions. 
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2.2 RP: An Expansion in a Set of Basis Functions 



While the intuitive approach to understanding recursive partitioning is through recursive splitting, it is recast 
now in a form that provides a reference for explaining the MARS methodology. The central idea is to formulate 
the recursive partitioning model as an additive model of functions from disjoint subregions. Also, we associate the 
operation of subregion splitting with the operation of step function multiplying. The new approach approximates 
the unknown function f(x) at x with an expansion in a set of basis functions from disjoint subregions { Rj}j=i , 

5 

/(*) = ( 9 ) 

;=i 

where 

Bj{x ) = i[x e Rj ], 

and 7[x] is an indicator function with value 1 if its argument is true and 0 otherwise. The constant function 
Cj estimates the true but unknown function f(x) over the Rj th subregion of D and Bj(x) is a basis function 
that indicates membership in the Rj th subregion of D. We call Bj(x) a basis function because it restricts 
contributions for f(x) to those values of x in the Rj th subregion of D. The approximation of the unknown 
function f(x) at x in (2) and (9) are equivalent; the subregions {77j}J =1 are the same disjoint subregions of 
the domain D and the constant functions {cj}j =l are the same constant functions that estimate f(x) over each 
subregion. 

During each search for a partition of a subregion Rj using an expansion in a set of basis functions (9), the 
selection of a candidate partition point creates a particular functional form for f(x) that we call g in the following 
algorithm. Let H[rj\ be a step function that returns a value of 1 if 77 is positive and 0 otherwise. Following Fried- 
man (1988), an algorithm to implement the forward step recursive partitioning procedure using an expansion in 
a set of basis functions is: 
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Recursive Partitioning Algorithm (Forward Step) 



(10) 



Ri = D, £,(*) = 1 (a) 

For each subregion R m , m = 2 to M do: (b) 

boF = oo, j* =0, v* = 0, F = 0 (c) 

For each established subregion Rj , j = 1 to m — 1 do: (d) 

For each predictor variable x v in Rj, v = 1 to p do: (e) 

For each data value * in Rj , 2 = x v jk =1 to x Vi jt =n do: (f) 

9 = (E dti c d B d(*)) + c m Bj(x)H[t - x v ) + CjBj(x)H[x v - <] (g) 

bof = BOF m (h) 

if bof < boF then boF = bof ; j * = j; v* = v\ F = t end if (i) 

end for 
end for 
end for 

R m ~-R j .H[t'-x v .] G) 

Rj* <- Rj*H[x v * — F] (k) 

end for 
end 



The forward step recursive partitioning algorithm is initialized with the first subregion R } equal to the entire 
domain D (10a). The outer loop (10b) controls the iterative creation of the subregions {R m }m= 2* Next, the 
dummy variables (10c) for the evaluation of the fit procedure boF , region j* , predictor variable v* , and partition 
point F are initialized in preparation for identifying the next partition of an established subregion 
The three nested inner loops (lOd-lOf) perform the exhaustive search for the next partition point by iteratively 
searching across all established subregions (lOd), all predictor variables (lOe), and all values of the predictor 
variables in the jth subregion (lOf). Given the investigation of a partition point t for a predictor variable x v 
in subregion i? ; -, the function g (lOg), with parameter vector c = (ci,...,c m ), is the current candidate for a 
recursive partitioning model estimate of f(x) in the mth iteration of the forward step procedure. The first term 
in (lOg) includes all subregions except subregion Rj. The last two terms in (10g), 

c m Bj(x)H[t - x v ] + CjBj(x)H[x v - /], 

reflect the proposal to divide the parent subregion Rj into two disjoint sibling subregions using the step functions 
H[t — x v ] and H[x v — /] to identify each x's location with respect to the partition point t. Next, BOF m (10h) is 
the forward step measure of fit that evaluates the function g with respect to the data. Information for the best 
yet discovered partition, predictor variable, and subregion (lOi) is retained as the search continues for the best 
partition of an established subregion {Rj}™S\ l in the mth iteration. Completion of the. mth iteration’s search 
results in the division (and elimination) of the old parent subregion R into two disjoint sibling subregions 
(lOj and 10k) based on x v Fs location with respect to the partition point F. The iterations continue until the 
domain D is partitioned into M disjoint subregions {Rj}jL\- 
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Each basis function Bj(x) identifies membership in the Rjth subregion of D and is the result of the product 
of step functions whose partition points define the subregion Rj. For example, let R& be a subregion from the 
sequence of step functions H[x\ — fj], H[t 2 — £ 2 ], H[x 2 — < 3 ] and H[t\ — £ 1 ] where {< }-Lj is 0 , 1 , 0,1 respectively. 
Then the basis function B$(x) is, 

B 5 (x) = H\x\ - 0] x H[ 1 - £ 2 ] x H[x 2 - 0] x H[ 1 - z x ] t (11) 

which defines the subregion R 5 as a unit square in $ft 2 . The basis function B^(x) = 1 if 0 < £j < 1 and 
0 < £ 2 < 1 and is 0 otherwise. 

In recursive partitioning the subregions {Rj}j =l are disjoint. Each data point x is only a member of one 
subregion Rj . Therefore, the estimate of f(x) over subregion Rj is restricted to the functional form for fj(x). 
However, as we will address in section 4, MARS has overlapping subregions. The estimate of f(x) over subregion 
Rj may be obtained as a sum of multiple functional forms. 

Recursive partitioning is a very powerful methodology that is rapidly computed, especially if fj(x) is the 
constant Cj. Each forward step of the algorithm (10) partitions one and only one subregion of the domain on an 
influential variable x v *. This procedure increasingly localizes the activity of the predictor variables with respect 
to the response variable y. However, in general, there are several drawbacks to using recursive partitioning as a 
regression modeling technique. 

• Recursive partitioning models have disjoint subregions and are usually discontinuous at subregion bound- 
aries. This is disconcerting if we believe f(x) is continuous. 

• Recursive partitioning has an innate inability to adequately estimate linear or additive functions. This is 
due to the recursive division of established subregions during the forward step procedure that automatically 
produces predictor variable interactions unless all successive partitions occur on the same predictor variable. 

• The form of the recursive partitioning model (9), an additive combination of functions of predictor variables 
in disjoint regions, makes estimation of the true form of the unknown function f(x) difficult for large p. 

3 REGRESSION SPLINES 

The development of a regression spline model offers another method for explaining MARS. Silverman (1985) 
views spline functions as an attractive approach to modeling that may be thought of as a span between parametric 
and nonparametric regression methodology. For simplicity define a gth order polynomial function of x E D C 3 ? 1 
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with coefficients c/ as 



<? 

pq{x) = Y2 ClX ‘ for 1 € £>. ( 12 ) 

1=0 

Polynomials such as (12) are smooth and easy to manipulate. However, fitting data with a polynomial model 
may require higher order terms that may have unacceptable fluctuations. This leads us to divide the domain D 
into smaller subregions Rj to permit the use of polynomial functions of relatively low order. 

Let [a, b] = D C 5ft 1 and As = , *s-i} denote an ordered partition of [a, 6] into S disjoint subregions 

i.e., a = to < t\ < ... < ts-\ < ts = b. Denote the S disjoint subregions as Rj = for j = 1 ,...,5. 

Let C q [D\ represent the set of all continuous functions in D whose q — 1 derivatives are also continuous. Using j 
to index the subregions we define a spline function as a set of S piecewise qth order polynomial functions whose 

function values and first q — 1 derivatives agree at their partition points i.e., 

5 

S A S ( Z ) = XX>( Z ) 7 ( Z G R i\ ( 13 ) 

i = i 

with the restriction that s q As (x) E C q [D \ . 

There are several approaches for implementing splines within a regression setting (Wegman and Wright, 1983). 
One approach is the piecewise regression spline model, 



y = *L( X ) + f > 



(14) 



where again c is assumed to have mean zero and variance of, and s^ s (z) from (13) estimates f(x). 

Given a set of partitions points A 5, Smith (1979) has shown that a different and more useful regression spline 
model may be written using plus (H-) functions. The plus function is defined as 



( u if u > 0 
u+ " \ 0 if u < 0. 



(15) 



Again, let [a, 6] = D C K 1 . However, we now let A so = {<i>- .. ,< 5-1} define an ordered partition of [a, 6] into 
S overlapping subregions and denote the S overlapping subregions as Rj = for j = 1,...,S. Let / 

index the order of the polynomial terms in each subregion of the domain and Cji denote the coefficients for the 
Ith term of the polynomial function in the ( j -f l)st subregion of a spline model. Using plus functions results in 
a truncated regression spline model functionally equivalent to the piecewise regression spline model (13), 

q 5-1 

y= ^c 0( x' + ^c i ,[(z-< i ) + ] ? + c, (16) 

1=0 j- 1 

where e is assumed to have mean zero and variance o\. Since the partitions points of the set A so are ordered, 
the number of overlapping truncated spline functions with nonzero values increases by 1 as we move to the right 
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Quadratic Regression Spline Functions 

P1«c«wls« Spline with Partition Point (X » 1) 




Truncated Spline with Partition Point (X - 1) 




Figure 1: The different forms for a piecewise (13) and truncated (16) spline function using q = 2 order splines over the 
region D = [0,2] with a single partition point, at X = 1. 



and cross each partition point tj. Figure 1 compares the different forms for a q — 2 order piecewise (13) and 
truncated (16) spline function. 

The key point of this section is that once the number and the values of the partition points are 

fixed, the qth order truncated regression spline model (16) with those partition points is a linear model whose 
coefficients c may be determined by straightforward least squares regression. However, the major difficulty in 
implementing a qth order regression spline model is choosing the number and values of the partition points. 

We have defined regression spline models in 5ft 1 . The extension to higher dimensions for p > 1 predictor 
variables is usually accomplished u§ing products of univariate spline functions. However, products of univariate 
spline functions suffer from the curse- of- dimensionality discussed previously. From the perspective of regres- 
sion splines, MARS attempts to overcome the curse-of- dimensionality by using a modified recursive partitioning 
strategy to select partitions of the domain. This permits MARS to exploit the localized, low-dimensional struc- 
ture of the data using q = 1 truncated, multidimensional regression spline functions. 



4 FRIEDMAN’S INNOVATIONS FOR RECURSIVE 

PARTITIONING 



Recursive partitioning and regression splines have tremendous power for modeling in high dimensional en- 
vironments. Each approach also presents difficulties when applied; recursive partitioning has discontinuities, 
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variable interactions and poor model interpretation, and regression splines battle the curse-of- dimensionality 
and lack a methodology to optimally select its many parameters. 

Two aspects of the recursive partitioning algorithm (10) contribute to the difficulties of its application in a 
high dimensional setting. The iterative division and elimination of the parent region when creating its sibling 
subregions causes difficulty in estimating linear and additive functions. The discontinuous nature of the step 
function H[rj\ when applied in each linear regression of the forward step recursive partitioning algorithm (10g) 
causes the lack of continuity. Together, these characteristics make interpretation of the recursive partitioning 
model difficult at best. 

To overcome recursive partitioning’s difficulty in estimating linear and additive functions, Friedman proposes 
that the parent region is not eliminated (as in recursive partitioning) during the creation of its sibling subregions. 
Thus, in future iterations both the parent and its sibling subregions are eligible for further partitioning. An 
immediate result of retaining parent regions is overlapping subregions of the domain. Also, each parent region 
may have multiple sets of sibling subregions. With this modification, recursive partitioning can produce linear 
models with the repetitive partitioning of the initial region R\ by different predictor variables. Additive models 
with functions of more than one predictor variable can result from successive partitioning using different predictor 
variables. This modification also allows for multiple partitions of the same predictor variable from the same parent 
region. 

Maintaining the parent region in a modified recursive partitioning algorithm results in a class of models with 
greater flexibility than permitted in recursive partitioning. However, the modified approach is still burdened with 
the discontinuities caused by the step function H[rj\. To alleviate this difficulty, Friedman proposes to replace 
the step function H[rj\ in the model formulation step ( lOg) with q = 1 order (i.e. linear) regression splines in the 
form of left (-) and right (+) truncated splines. Let r m represent a 2-tuple associated with the R m th subregion 
whose components identify the direction (left or right), specific predictor variable, and partition point used to 
create subregion Rm from its parent region. A left and right truncated spline for creating the /? m th and R m +\st 
subregion from the parent region Rj with a partition point at x v — t is defined as 

J j',r.(*) = [(!-a:v)+r 1 =(l-i»)+ and 7j,r m + . ( x ) = [(*<> “ 0+] ,= 1 = ( r v - <)+. ( 17 ) 

where r m = (— v,J) and r m + i = (+t>,<) and m > j. The additional subscripts j and m, or j and m+ 1, provide 
a necessary audit trail for products of truncated splines when interactions are allowed among multiple predictor 
variables. Note that the truncated spline functions act tn only one dimension although their argument is a vector 
of predictor variables. 
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A modeling approach using linear truncated splines (17) creates a continuous approximating function f(x) 
with discontinuities in the first partial derivative of f(x) at the partition points of each predictor variable in 
the model. The argument for using linear truncated splines (17) is that there is little to be gained in flexibility, 
and much to lose in computational speed by imposing continuity beyond the function f(x). Linear truncated 
splines allow rapid updating of the regression model and its coefficients during each exhaustive search for the 
next partition of an established subregion. The placement of additional partitions may be used to compensate 
for the loss of flexibility in using linear truncated splines to estimate f(x) over a subregion of the domain. 

Implementation of the modifications proposed above to the recursive partitioning algorithm avoids its identified 
difficulties and results in the MARS algorithm. The MARS algorithm produces a linear ( q = 1) truncated spline 
model (16) with overlapping subregions {Rj}^ of the domain D. Each overlapping subregion of a MARS model 
is defined by the partition points of the predictor variables from an ordered sequence of linear truncated splines. 

Define the product basis function K m (x) as the ordered sequence of truncated splines associated with sub- 
region R m . The first term of every product basis function is To >ri (x) = 1, the initialization function associated 
with R\. Each additional truncated spline represents the iterative partitioning of a parent region into a sibling 
subregion. For example, assume the sequence of ordered truncated splines for the parent region R7 is (1,3,7), 
which is split using T ri Tiii (x) to create subregion R m . The product basis function K m (x) associated with the 
Rm th subregion for this example is 



Km(&) — To ri (x) x ^V lt r 3 (*) x Tr 3t 

" V- 



(18) 



where m > 7. 

To evaluate K m (x) at x requires the evaluation of each truncated spline in the product basis function at x. If 
any of the truncated spline evaluations at x are zero, then K m (x) at x is 0. Otherwise, the evaluation of K m (x) 
at x is the product of the truncated splines at x. For example, let the ordered truncated splines for R$ £ 5ft 3 be 
(1,2 and 5) with r 2 = (2,3) and r 5 = (—3, 1). The product basis function associated with R$ is 



A" 5 (a;)— To >ri (x) x T riy r 2 {x) x Tr 2t rA x ) 
- lx(r 2 “ 3)+ x (1 - x 3 )+ 



( (z 2 -2)(1-z 3 ) 

l 0 



if x 2 > 2 and x 3 < 1 
otherwise. 



If x\ = {5,4,0} £ R*> and x 2 = {4,3,6} 3 #5, then K^{x\) ~ 2 and K$(x 2 ) = 0. 

The level of interaction of the predictor variables associated with Rj is the number of truncated splines 
(without To^xix)) in a product basis function Kj (x). A one term product basis function represents a truncated 
linear relationship of its predictor variable while a two term product basis function represents a truncated 2-way 
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interaction and so on. The number and level of interactions in a MARS model are only limited by the data and 
the level of interactions permitted by the MARS algorithm. 

The MARS estimate of the unknown function f(x) is 

5 

/(*) = ^c ; ^(x), (19) 

j = l 

where f(x) is an additive function of the product basis functions {Aj(x)}j_j associated with the subregions 
As in recursive partitioning the objective of the forward step MARS algorithm is to iteratively adjust 
the vector of coefficient values to best fit the data while identifying the subregions {Rj}jL for M > 5, whose 
product basis functions approximate f(x) based on data at hand. And again, as in the recursive partitioning 
procedure, it makes sense to follow the forward step procedure with a backward step trimming procedure to 
remove the excess (M — 5) subregions whose product basis functions no longer sufficiently contribute to the 
accuracy of the fit. 

5 FORWARD STEP MARS ALGORITHM 

The MARS forward step algorithm results from applying the modifications addressed in Section 4 to the 
forward step recursive partitioning algorithm (10). Again we initialize Ri = D. However, in MARS we create 
two new subregions R m and R m +\ and maintain the parent region Rj* during each partition. Also, MARS 
restricts each sequence of truncated splines from having more than one partition per predictor variable because 
this creates a nonlinear spline function i.e., one with q > 1. MARS enforces this restriction, during the search 
for the next best partition of a subregion Rj , by excluding from consideration for a partition point any predictor 
variable already included in the product basis function Kj(x). The most notable difference between the RP 
and MARS algorithms occurs in forming the MARS model. Again following Friedman (1988), the product basis 
functions {Kj(x)}J l =l given at (18) and the truncated splines T rj> r m ( x ) and T rji r m+1 (^) given at (17) replace 
the basis functions {^(x)}™.! and the step functions H[t — x„] and H[x v — t ] in the forward step recursive 
partitioning algorithm (lOg) respectively. 
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MARS Forward Step Algorithm 



( 20 ) 



R\ = D } To^^x) — 1 (a) 

For each subregion R mj m = 2 to M do: (b) 

bof* = oo, j* = 0, v* = 0, r = 0 (c) 

For each established subregion Rj , j — 1 to m — 1 do: (d) 

For each predictor variable x v in Rj, v = 1 to p such that v 3 Kj(x) do: (e) 

For each data value x V} k in Rj, t = x Vt k-\ to x v ^ =n do: (f) 

9 = (T,d c dl<d(x)) + c m Kj(x)T rj ,r m \x) + c m+l K j (x)Tr j< r m+l ( x ) (g) 

bof= BOF m (h) 

if bof < bof* then bof* = bof ; j * = j; v* — v\ t* = t end if (i) 

end for 
end for 
end for 

Rm ^Rj-H[(r-x v .)) (j) 

Rm+i *-R j *H[(x v .-f)] (k) 

m ♦— m *f 2 (1) 

end for 
end 



To characterize the MARS procedure we use the example discussed in section 2 with p = 3 predictor variables, 
and M = 5, the maximum number of forward step partitions. The MARS algorithm parallels the recursive 
partitioning algorithm except for the modifications discussed in section 4. At the start of the MARS forward 
step algorithm for our example problem, the initial subregion is again the entire domain i.e., R\ = D. Thus, the 
single subregion MARS estimate of f(x) is identical to the recursive partitioning estimate, 

1 N 

f(x) = Cil<i(x) = ciT 0 ,r,(*) = Cl = (21) 

Again, let the exhaustive search in the first iteration of MARS identify the best partition of R\ as t* = £ 2 , 25 - 
Continuing, the three subregion MARS estimate of f(x) obtained at the second step (first partition at t* = £ 2 , 25 ) 
is, with To t n (*) = 1, 

/(*) = d K i(x) + c 2 K 2 (x) + c 3 K 3 (x) (22) 

= Ci To } ri (x) -b C2 To.ri (®) Tr u r 7 (®) + c 3 To tVl (x) Tr lt r 3 (*) 

= Cl + C2 ( t * — £2)+ + C 3 (£2 — F)+, 



{ R\ if x 6 D 

R 2 if £2 < £ 2,25 an d x 6 R\ 

R 3 if £2 > £ 2) 25 and x 6 R\. 

In the next iteration of the forward step MARS algorithm the best partition point will occur within the 
subregions R\,R 2 or R$ and as in recursive partitioning, with one exception, will be chosen after evaluation of 
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all potential partition points for each predictor variable within the three subregions. The exception, as discussed 
previously, prevents another partition on x 2 in R 2 or R 3 because it would create a nonlinear truncated spline 
function. With A/ = 5 the forward step of the MARS algorithm will be complete after a second partition in D. 
The final forward step MARS estimate of f(x) for our example will include all terms in (22) and the additional two 
terms generated by the second partition. The model will have 5 single term product spline functions (excluding 
To,ri(*)) if the second partition occurs in R\ while the model will have 3 single term product spline functions 
and two 2-way product spline functions if the second partition occurs in R 2 or R$. 

After the backward trimming procedure, the final MARS model retains the form of (19) with c x the coeffi- 
cient of the product basis function K\(x) and the remaining terms the coefficients and product basis functions 
that survive the MARS backward step subregion deletion strategy. To provide an insight of predictor variable 
relationships we can rearrange the final MARS estimate of f(x) in an ANOVA style decomposition, 



/(*) = c, + c i K j( x ) + 53 c i K i( x ) + ■■■ ( 23 ) 

V - 1 V=2 

where V indexes the number of truncated splines (excluding To^^x)) in the product basis function {K 
This method identifies any and all contributions to f(x) by variables of interest. Product basis functions with the 
index V = 1 reflect truncated linear trends and those with the index V = 2 reflect truncated 2- way interactions, 
etc. The ANOVA style decomposition (23) identifies which variables enter the model, whether they are purely 
additive, or are involved in interactions with other variables. Analysis of the ANOVA style decomposition 
facilitates interpretation of the MARS model. 

MARS uses residual-squared-error in the forward and backward steps of the algorithm to evaluate model 
fit and compare partition points because of its attractive computational properties The actual backward fit 
criterion is a modified form of generalized cross validation ( GCV ) first proposed by Craven and Wahba (1979). 
The GCV criterion of a MARS model with the subregions {Rj}j L x is, 



GCV(M) = " 1 . 

[1 - ^] 2 



(24) 



where C(M) is a complexity cost function, increasing in A/, which accounts for the increasing model complexity 
due to the sequential partition of D into the subregions {Rj}j= l . The numerator of the GCV criteria is the 
average residual-squared-error and the denominator is a penalty term that reflects model complexity. 



6 NONLINEAR MODELING OF TIME SERIES USING MARS 

Most research in and applications of time series modeling and analysis are concerned with linear models. 
However, nonlinear time dependent systems abound that are not adequately handled by linear models. For 
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these systems we need to consider general classes of nonlinear models that readily adapt to the precise form of 
a nonlinear system of interest (Priestley, 1988). By letting the predictor variables for the rth value in a time 
series {x r } be x r _ 1 ,x T _ 2 , . . . , x r _ p , and combining these predictor variables into a linear additive function, one 
gets the well known linear AR(p) time series models. What happens if we use the MARS methodology to model 
the effect on x r by x T _i, x r _ 2 , . . . , x r _ p ? The answer is that we still obtain autoregressive models of the effect 
on x T by x T _i, x r _ 2 > • • • > x r-p, however, these models can have nonlinear terms from lagged predictor variable 
thresholds and interactions. We now pursue the form and analysis of these nonlinear models. 

Threshold models (models with partition points) are a class of nonlinear models that emerge naturally as 
a result of changing physical behavior. Within the domain of the predictor variables, different model forms 
are necessary to capture changes to the relationship between the predictor and response variables. Tong (1983) 
provides one threshold modeling methodology for this behavior (TAR - Threshold Autoregression) that identifies 
piecewise linear pieces of nonlinear functions over disjoint subregions of the domain D i.e., identify linear models 
within each disjoint subregion of the domain. One application of Tong’s threshold modeling methodology is for 
nonlinear systems thought to possess periodic behavior in the form of stationary sustained oscillations (limit 
cycles). Tong’s threshold methodology has tremendous power and flexibility for modeling of many times series. 
However, unless Tong’s methodology is constrained to be continuous, it creates disjoint subregion models that 
are discontinuous at subregion boundaries. 

With MARS, by letting the predictor variables be lagged values of a time series, one admits a more general 
class of continuous nonlinear threshold models than permitted by Tong’s TAR approach. We call the methodology 
for developing this class of nonlinear threshold models ASTAR (Adaptive Spline Threshold Autoregression). The 
fact that one obtains a more general class of continuous nonlinear threshold models can be shown using a simple 
example. Let X T for t = 1, ... , A r , be a time series we wish to model with ASTAR using, for example, p = 3 
lagged predictor variables namely, X T _i, X r _2 and X T -%. Each forward step of the ASTAR algorithm selects 
one and only one set of new terms for the ASTAR model from the candidates specified by previously selected 
terms of the model. The sets of candidates for the initial forward step of the ASTAR algorithm for our example 
problem is 



(X T _i - <*)+ and ( t' -X T _i) + , or 
(X t -2 - <*)+ and (<* - X r _ 2 )+, or 

(X t _3-<*) + and(f -X t _ 3 ) + , (25) 

for some partition point (threshold) t* in the individual domain of the lagged predictor variables. For our 
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example problem, assume that ASTAR selects the lagged predictor variable X T _2 with threshold value t m = t\ 
i.e., (X t -2 — *i)+ and (*i — X r _ 2 )+ are the initial terms (other than the constant) in the ASTAR model. The 
sets of candidates for the second forward step of the ASTAR algorithm includes all candidates in (25) and the 
new sets of candidates: 

r) + (* T _ 2 -t,)+ and (<* -X T .,) + (X T _ 2 -<,)+, or 
(AV -3 — t )+(At _2 — fi)+ and (t* — X T _3)+(X r _2 — *i)+» or 
{Xr-x - n+(<! - X T - 2 )+ and (r - X T -i)+(t x - X r _ 2 ) + , or 

(Xt — 3 - n+(ti - Xr- 2 )+ and (r - X T - 3 )+(<i - X r _ 2 )+, (26) 

due to the initial selection of (X T _2 — fi)+ and (fi — A' T _ 2 )+. The sets of candidates for each subsequent forward 
step of the ASTAR algorithm is non decreasing in size and is based on previously selected terms of the model. 
As discussed in Section 4, the forward step algorithm is followed by a backward step algorithm that trims the 
excess (M — S) terms from the model. 

By modeling univariate time series using ASTAR we overcome the limitations of Tong’s approach. The 
ASTAR methodology creates threshold models that are naturally continuous in the domain of the predictor 
variables, allow interactions among lagged predictor variables and can have multiple lagged predictor variable 
thresholds. In contrast, Tong’s methodology creates threshold models from piecewise linear models whose terms 
are restricted to the initial sets of candidates of the ASTAR algorithm ((25) for our example). Tong’s threshold 
models do not allow interactions among lagged predictor variables and are usually limited to a single threshold 
due to the difficulties associated with the threshold selection process. 

We next examine the ability of ASTAR to identify and model linear and nonlinear times series models. The 
simulation of an AR(1) model with known coefficients examines the ability of ASTAR to detect and model a 
simple linear time series. The simulation of a threshold model with ‘AR(l)-like’ models in each disjoint subregion 
examines the ability of ASTAR to detect and model simple nonlinear threshold time series. Finally, in Section 6.3 
we examine the ability of ASTAR to model the widely studied Wolf sunspot numbers, a nonlinear time series 
with periodic behavior. 

6.1 AR(1) Simulations 

We first consider simulation of an AR(1) model, 

X r =pX T - X +/v +e r (27) 
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where r = 1,2,..., N indexes the time series, p is a constant coefficient varied within experiments, K = 0 
is the model constant and e r is /V(0,of). The model is usually considered under the stationarity conditions 
(| p |< 1), but random walks (| p | = 1) and explosive processes (| p \ > 1), are also of interest. Two categories 
of experiments were conducted using the AR(1) model. The first experiment required ASTAR to estimate a 
model from the simulated data of the AR(1) model using one lag predictor variable AV_i, and using M = 3, 
the maximum number of subregions in the forward step ASTAR procedure. The first experiment’s alternative 
models either have no X T ~\ term (a constant model) or have a X T ~\ term with a threshold value t greater than 
minfAV-i}^! 1 . In this case we call the threshold value t an internal threshold. The second experiment required 
ASTAR to estimate a model from the simulated data of the AR(1) model using four lag predictor variables, 
{ x , and using M = 8, the maximum number of subregions allowed in the forward step ASTAR procedure. 
The second experiment’s alternative models include constant models, models with an internal threshold value, 
and any model that includes a term other than X r ~\. The interest in these simulations is two- fold: how often 
was the true model identified, and if so, how well were the parameters K and p estimated. 

Several simulation results are shown in Figures 2-7 for p = .5, .7 and .9, K = 0, and e T = N(0,1). Each figure 
is a series of box plots for the coefficients of the 100 AR(1) simulated models correctly identified by ASTAR 
for increasing values of N . The true value of each model coefficient is identified by the dashed line across the 
box plots. At the top of each figure is the length N of each simulated time series, the number C of the 100 
simulated models correctly identified by the ASTAR procedure, and the equivalent sample size for independent 
data, Eq S SIZE = {N / YlTL-oo P % ) (Priestley, 1981). Underneath each box plot is summary information for the 
coefficient estimates of the correctly identified AR(1) models i.e., the sample mean and sample standard deviation 
of the values in the box plots. By comparing the true and the estimated values of the model coefficients across 
increasing values of N it is observed that the estimated values of the coefficients tend to the true value as N 
increases. Also, in all but one simulation the number of correctly identified models (100-C) rises to 100 for 
increasing values of N . Note that the ASTAR estimates for p have negative bias for small values of N that 
generally decreases as N increases. The downward bias of p is similar to that identified by Kendall et al. (1983) 
and others when using data for estimating autocorrelations. 

6.2 Threshold Simulations 

To observe the ability of ASTAR to capture nonlinear threshold model characteristics we consider simulation 
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of the 2-subregion threshold model 






P\X T -i + ^ r 

P2 — 1 + 



if Y r -i < 0 
if X T ^i > 0 



(28) 



where r = 1,2, . . . , TV indexes the time series, pi and p 2 are constant coefficients varied for different experiments 
and € is 7V(0, <j \ ). Note that the threshold model (28) has an ‘AR(l)-like’ model in each subregion. Two categories 
of experiments were conducted using the threshold model. The first experiment required ASTAR to estimate a 
model from the simulated data of the threshold model using one lag predictor variable X T -\, and using M = 3, 
the maximum number of subregions in the forward step ASTAR procedure. The first experiment’s alternative 
models are either linear or have more than one internal threshold. The second experiment required ASTAR to 
estimate a model from the simulated data of the threshold model using four lag predictor variables, {AT r _,}* =1 , 
and using M = 10, the maximum number of subregions allowed in the forward step ASTAR procedure. The 
second experiment’s alternative models have more than one internal threshold term, include terms other than 
AV_i, or are linear. 

Several simulation results are shown in Figures 8-11 for p\,pi = .7, .3 and —.6, .6, and e r = N(0,.25). As 
with the previous AR(1) model simulation experiments, each figure is a series of box plots for the coefficients of 
the 100 threshold simulated models correctly identified by ASTAR for increasing values of N. The true value 
of each model coefficient is identified by the dashed line across the box plots. At the top of each figure is the 
length N of each simulated time series, and the number C of the 100 simulated models correctly identified by 
the ASTAR procedure. Underneath each box plot is summary information for the coefficient estimates of the 
correctly identified threshold models i.e., the sample mean and sample standard deviation of the values in the 
boxplots. Note that the number of correctly identified models rises for increasing values of N. However, a 
consistent improvement in the mean and standard deviation for the estimated values of the model coefficients is 
not always observed for increasing values of N . For the most part this is attributed to the increasing number of 
correctly identified models for increasing values of N. 



6.3 Threshold Modeling of the Wolf’s Sunspot Numbers 



As an illustration of ASTAR ability to model an actual time series we examined 221 (1700-1920) of the Wolf 
sunspot numbers. The Wolf sunspot numbers are relative measures of the average monthly sunspot activity on the 
surface of the sun (see, e.g., Scientific American, February 1990). Some of the early analysis and modeling of the 
sunspot numbers was performed by Yule (1927) as an example for introducing autoregressive models. Recently 
suggested nonlinear models of the sunspot numbers include threshold models (Tong, 1983) and bilinear models 
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(Rao and Gabr, 1984). A detailed review of the history of the sunspot numbers is provided by Izenman (1983). 

The data (Figure 12) is quite periodic’ but has nonsymmetric cycles with extremely sharp peaks and troughs. 
The cycles (Table 1) generally vary between 10 and 12 years with the greater number of sunspots concentrated 
in each descent period versus the accompanying ascent period. The average ascent period is 4.60 years and the 
average descent period is 6.58 years. Attempts to model the data with a fixed cycle period signal plus (possibly 
correlated) noise have failed because the cyclic component in the spectrum (Figure 13, top) is quite spread out 
and diffuse. 



Ascent period 5 
Descent period 




566333 66 

65566 11 67 



Ascent (cont) 745435444 

Descent (cont) 36878688 



Table 1: Ascent and Descent periods of the Sunspot Data (1700-1920). 



One of the interesting characteristics of Tong’s analysis of the sunspot numbers included the development of 
threshold models with stationary harmonic behavior or limit cycles. Using Tong (1983), let r = 1,2,... index 
a times series and let x k = {x r ,x r _i, .. . ,x r _*+i} denote a ^-dimensional vector in D 6 that satisfies the 
equation, 

= /(*r — 1)> (29) 



where / is a vector-valued function. Let ft (x) denote the j th iterate of /, 



i.e. 



We say that a fc-dimensional vector * 



/'(*) = /(/(/(...(/(*))...»)• (30) 

' V 

j of them 

* k is a stable limit point of the function / with respect to the domain D if 



f*(x o) — ► x* k as j — ► oo V xq 6 D. 



(31) 



Also, we say that a k - dimensional vector c k is a stable periodic limit point with period T > 1 of the function / 
with respect to the domain D if 

f jT (x 0 ) — ► c\ as j — ► oo V aj 0 € D, (32) 

and the convergence does not hold for any divisor of T. It follows that c\ , /^c*), / 2 (cf), . . . , f T ~ l (c k ) are 
simultaneously distinct stable periodic points of the function / with respect to D. If we let f l (c k ) be denoted 



20 



by c* +1 , i = 0, 1 , . . . , T — 1, then the set . . . , c^_ 1 } is called a stable limit cycle of the function / with 

respect to D. 

Our primary interest in limit cycles is for investigating the underlying characteristics of the true function 
/(x) given at (1). If we believe that the cyclical behavior of /(x) can be modeled as a limit cycle perturbed by 
Gaussian white noise, as we do with the sunspot numbers, then when applying ASTAR to the sunspot numbers 
it would be satisfying to identify an underlying limit cycle in our estimate of /(x). With this objective in mind 
we investigated 20 ASTAR models of the sunspot numbers. The different models were identified by varying the 
user parameters of the ASTAR algorithm to include; level of interaction, number and separation of partition 
points, number of forward step subregions, and availability of lagged predictor variables. The maximum order 
of the model (number of lagged predictor variables) was restricted to 20 and the first 20 sunspots (1700-1719) 
were used for model initialization. 

Table 2 provides a summary of the 20 ASTAR models for the sunspot numbers (1720-1920), ordered by the 
generalized cross validation (GCV) criterion (24). The first three columns identify the model number, the GCV 
criterion and the mean sum of squares (MSS) of the fitted residuals for each ASTAR model. The fourth through 
sixth columns identify the number of estimated parameters, the number of partition points and the maximum 
level of interaction in each model. Columns seven and eight identify the length (in years) of each model’s limit 
cycle (if one exists) and the number and lengths (in years) of the one or more type ‘subcycles’ (ascent and descent 
periods) within the limit cycle. We use MSS instead of MSS 1 / 2 to facilitate comparison of the ASTAR models 
with other modeling efforts of the Sunspot numbers. 
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GCV 


MSS 


Number of 
Model 
Parameters 


Number of 
Interior 
Thresholds 


Level of 
Model 
Interaction 


Length of 
Limit Cycle 
(in years) 


Number 
(Lengths) of 
Subcycles 


1 


141.6 


91.6 


16 


4 


4 


225 


27 


(8,9) 


2 


159.9 


111.7 


14 


4 


3 


9 


1 


( 9 ) 


3 


160.5 


91.4 


25 


9 


3 


— 






4 


164.8 


95.3 


18 


4 


5 


— 






5 


165.9 


113.0 


19 


7 


2 


9 


1 


( 9 ) 


6 


166.2 


115.9 


13 


3 


4 


120 


11 


(10,11) 


7 


167.2 


114.2 


14 


3 


4 


— 






8 


173.9 


119.5 


14 


3 


3 


— 






9 


174.1 


114.2 


14 


6 


3 


137 


13 


(10,11) 


10 


176.8 


125.6 


11 


2 


2 


78 


7 


(11,12) 


11 


180.1 


101.0 


15 


6 


4 


43 


4 


(10,11) 


12 


180.3 


115.9 


13 


3 


3 


— 






13 


184.1 


119.8 


11 


2 


3 


133 


12 


(11,12) 


14 


187.9 


103.6 


18 


3 


4 


— 






15 


190.2 


126.2 


13 


1 


3 


— 






16 


191.4 


110.5 


17 


3 


3 


167 


15 


(11,12) 


17 


193.5 


116.0 


13 


2 


4 


94 


10 


(9,10) 


18 


195.3 


117.6 


15 


2 


4 


— 






19 


195.5 


114.2 


17 


3 


3 


120 


11 


(10,11) 


20 


211.1 


119.6 


18 


3 


3 


23 


4 


(5,6) 



Table 2: ASTAR models of the Wolf Sunspot Numbers (1720-1920). 



Some form of a limit cycle exists in 12 of the 20 ASTAR models. Also, 5 of the 12 models, namely 9,10,11,13 
and 16, provide limit cycles with lengths 137,78,43,133 and 167 respectively, and ‘subcycles’ with lengths and 
range similar enough to the behavior of the sunspot data (Table 1) to warrant further analysis. Of these 5 
models, 2 (9 and 11) provide fitted residuals that appear independent and Gaussian. Some of the statistics for 
the fitted residuals of these two models are provided in Table 3. 





Model 9 


Model 11 




Mean 


0.000 


0.000 




MSS 


114.2 


101.0 




Skewness 


0.0813 


.346 


0 for normal distribution 


Kurtosis 


0.673 


0.153 


0 for normal distribution 


K-S 


.275 


.349 


level of significance 


C-M 


> .15 


> .15 


level of significance 


A-D 


> .15 


> .15 


level of significance 


L-M 


.6892 


.0466 


level of significance 



Table 3: Statistics for the Fitted Residuals of ASTAR Models 9 and 11 of the Wolf sunspot numbers 
(1720-1920). 



The Skewness and Kurtosis statistics serve as a general indicator of the symmetry and heaviness of tin* tails 
for the sample distribution function of the fitted residuals Fe(x). The Kolmogorov-Smirnov (K-S) test statistic 
measures the maximum absolute distance between F e (x) and the hypothesized true normal N(0,1) distribution 
function F x (x) while the Cramer- von Mises (C-M) statistic measures the integral of the squared distance between 
the two functions. A drawback to the K-S and C-M tests are that they lack sensitivity to departures from the 
null hypothesis that occur in the tails of a distribution. As an approach to overcome the lack of sensitivity of the 
K-S and C-M tests, the Anderson-Darling (A-D) test statistic weights the distances between the two functions 
A final test for independent and Gaussian error structure is provided by the Lin-Mudhoekar (L-M) test statistic 
which tests for asymmetry. We rejected Model 11 due to the low level of significance of the L-M test statistic 
and identified Model 9 as the best model (with limit cycle) of the 20 models considered in the initial analysis. 
Model 9 is 



2.710606 + .959891X r _x + .331893(47.0 - X T _ 5 )+ - .257034(59.1 - X r _ 9 ) + 



X T = 



-.002707X r _ 1 (X 



2 - 26.0)+ + .016674X r _i(44.0 - X r _ 3 )+ - .031516X r _,(17.1 - X r _ 4 )+ 



(33) 



+ .004166X^(26.0 - X T _ 2 )+(X r -. 5 - 41.0)+ 

where (x)+ is a plus function with value x if x > 0 and 0 otherwise. Model 9 has 8 terms (a constant term with 3 
one-way, 3 two-way and 1 three-way interactions) and 6 threshold values (1 each on X r _ 2 , X r _ 3 , A' T _ 4 , and X T _ 9 
and 2 on X T -$). 

Figures 12-17 are various plots of the fitted values and residuals of ASTAR Model 9. Figure 12 shows the 
fitted values of the model versus the Wolf sunspot numbers (1720-1920). The model appears to equally overfit 
and underfit the peaks and troughs as it captures the general structure of the sunspot numbers. The model fit 
is further examined using the estimated normalized periodogram (Figure 13) and autocorrelation function plots 
(Figure 14). The fitted residuals of the model are examined using residual versus time and fit plots (Figure 15) 
and the residual autocorrelation function plot (Figure 16). In Figure 15 the slight lack of negative residuals 
for small fitted values of the model is attributed to the sunspot numbers being positive random variables 
Figure 17 shows the 137 year limit cycle of Model 9 with its ascent and descent periods. The limit cycle is 
asymmetric with a range in amplitude of 17.7 to 94.5 and an average ascent/descent period of 4.3/6.23 years 
versus 4.6/6.58 years for the actual sunspot numbers. In comparing Model 9’s limit cycle (Figure 17) with the 
real sunspot data (Figure 12) note that the standard deviation of the fitted residual’s error variance is estimated 
as (MSS) 1 / 2 = 10.69 sunspots. 

To investigate the predictive performance of ASTAR Model 9, developed using the Sunspot numbers from 
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1700-1920, we compared it’s forward step predictions with the Full Autoregressive, Threshold (Tong, 1983) and 
Bilinear subset (Rao,1984) models for the Sunspot numbers from 1921-1955. The mean sum of squares for the 
errors of the predictions obtained by these four models are given in Table 4. 



Model 


AR 


Threshold 

(Tong) 


Bilinear 

Subset 

(Rao) 


ASTAR 
Model 9 




199.27 


153.71 


124.33 


114.2 


Number of 
Parameters 


10 


19 


11 


14 




190.9 


148.2 


123.8 


136.4 


*?(2) 


414.8 


383.9 


337.5 


324.1 


*?(3) 


652.2 


675.6 


569.8 


481.0 


*?(4) 


725.8 


716.1 


659.0 


427.3 


*?(5) 


771.0 


756.4 


718.9 


378.0 


<3-2(6) 


— 


— 


— 


420.8 


*?(7) 


— 


— 


— 


454.2 


*?(8) 


— 


— 


— 


468.6 



Table 4: The mean sum of squares error <5^, number of model parameters and the predictive mean sum 
of squares error crl(i) for the i th forward step prediction for the period (1921-1955) of the AR, Threshold, 
Bilinear and ASTAR models of the Sunspot Numbers for the period (1700-1920). 

The performance of the ASTAR model for forecasting the Sunspot numbers from 1921-1955 is a considerable 
improvement over the AR and Threshold models for every forward step and is an improvement over the Bilinear 
subset model for every forward step with the exception of the first step. Also, it is interesting and very surprising 
to note that the predictive mean sum of squares error for the ASTAR model decreases in the fourth and fifth step 
before increasing again. This phenomenon was also identified in subsequent analysis of other ASTAR models 
with limit cycles. We attribute this interesting phenomenon to the underlying limit cycle of the models. 

7 CONCLUSIONS 

MARS is a new methodology for nonparametric modeling that utilizes regression spline modeling and a 
modified recursive partitioning strategy to exploit the localized low dimensional behavior of the data used 
to construct f(x). Although MARS is a computationally intensive regression methodology, it can produce 
continuous models for high dimensional data that can have multiple partitions and predictor variable interactions. 
The final MARS model may be analyzed using an ANOVA style decomposition. Also, although the main 
advantages of MARS modeling are in high dimensional settings p > 2 it has been shown to be highly competitive 
with other regression methods in low dimensional settings (Friedman, 1988). 
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In this paper, by letting the predictor variables in MARS be lagged values of a time series, we obtain ASTAR 
(Adaptive Spline Threshold Autoregression), a new method for nonlinear threshold modeling of time series. We 
show this by applying ASTAR to simple autoregressive and nonlinear threshold models. A significant feature 
of ASTAR when modeling time series data with periodic behavior is its ability to produce continuous models 
with underlying sustained oscillations (limit cycles). Time series that possess such behavior include the Wolf 
Sunspots, Canadian Lynx and various river flow data sets to name a few. Our initial analysis of the Wolf sunspot 
numbers (1700-1920) using ASTAR produced several models with underlying limit cycles. When used to predict 
the Sunspot numbers (1921-1955), the ASTAR models are a significant improvement over existing Threshold 
and Bilinear models. 
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pX <T _|) + K + c T for 



r - 1.2, ... M 



N 
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250 
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96 


100 


£q S SIZE 


33 


03 



500 


750 


100 


100 


166 


250 



0.7 
0.6 
0.5 
0.4 

0.3 
Mean 
Sid Dev 



g- 



0.46172 

0.00167 




0.48456 

0.05047 




0.49731 0.49669 

0.03561 0.03265 



K 

0.04 * 

0.02 ■ 

0 -■ 

- 0.02 • 

-0.04 L 

Mean -0.0020 0.00037 0.00041 0.00004 

Sid Dev 0.01527 0.00632 0.00286 0.00176 




Figure 2: AR(1) MODEL SIMULATION: ASTAR estimates for p = .5, K = 0 and <r\ = A(0, 1) from C simulations of 
an AR(1) model for increasing values of N , with P = 1 lag predictor variables, and M = 3, the number of forward step 
subregions permitted in the ASTAR algorithm. Each simulation consists of 100 replications. The boxplots are for the 
estimates of the model parameters when ASTAR correctly identified the AR(1) model. For N = 100, 2 simulations were 
incorrectly identified as constant models. 



X| T ^ * pX| ra ,|j + K + c T for t » 1,2, ... ,N 
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Figure 3: AR(1) MODEL SIMULATION: ASTAR estimates for p = .5,K = 0 and <r\ = JV(0, 1) from C simulations 
of an AR(1) model for increasing values of N with P = 4 lag predictor variables, and M = 8 , the number of forward 
step subregions permitted in the ASTAR algorithm. Each simulation consists of 100 replications. The boxplots are for 
the estimates of the model parameters when ASTAR correctly identified the AR(1) model. For N = 100, 5 simulations 
were incorrectly identified as; 2 constant models, 1 AR(2) model and 2 AR(3) models. For N = 500, 2 simulations were 
incorrectly identified as constant models. 
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X (t) “ P X (t-I) + K -»■ C. 



for r - 1.2. ... M 



N 100 
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Std Dev 0.01319 



0.00537 



0.00262 



0.00160 



Figure 4: AR(1) MODEL SIMULATION: ASTAR estimates for p = .7, K = 0 and = ^(0,1) from C simulations 
of an AR(1) model for increasing values of A, with P = 1 lag predictor variables, and M = 3, the number of forward 
step subregions permitted in the ASTAR algorithm. Each simulation consists of 100 replications. The boxplots are for 
the estimates of the model parameters when ASTAR correctly identified the AR(1) model. Here all cases were correctly 
identified. 
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-0.0004 

0.00462 
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0.00333 



Figure 5: AR(1) MODEL SIMULATION: ASTAR estimates for p = .7, K = 0 and a\ = N(0, 1) from C simulations of 
an AR(1) model for increasing values of N with P — 4 lag predictor variables, and M = 8, the number of forward step 
subregions permitted in the ASTAR algorithm. Each simulation consists of 100 replications. The boxplots are for the 
estimates of the model parameters when ASTAR correctly identified the AR(1) model. For N = 100, 6 simulations were 
incorrectly identified as; 2 AR(2) models, 2 AR(3) model and 2 AR(4 ) models. 
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Figure 6: AR(1) MODEL SIMULATION: ASTAR estimates for p = .9,A^ = 0 and = N( 0,1) from C simulations 
of an AR(l) model for increasing values of A, with P = 1 lag predictor variables, and M = 3, the number of forward 
step subregions permitted in the ASTAR algorithm. Each simulation consists of 100 replications. The boxplots are for 
the estimates of the model parameters when ASTAR correctly identified the AR(1) model. Here all cases were correctly 
identified. 
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Figure 7: AR(1) MODEL SIMULATION: ASTAR estimates for p = .9, K = 0 and o\ = N( 0, 1) from C simulations of 
an AR(1) model for increasing values of of N with P — 4 lag predictor variables, and M = 8, the number of forward step 
subregions permitted in the ASTAR algorithm. Each simulation consists of 100 replications. The boxplots are for the 
estimates of the model parameters when ASTAR correctly identified the AR(1) model. For N = 100, 3 simulations were 
incorrectly identified as; 2 AR(2) models and 1 AR(3) model . 
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Figure 8: THRESHOLD MODEL SIMULATION: ASTAR estimates for = 7, .3 and <j\ = A(0,.25) from C sim- 
ulations of a threshold model for increasing values of N , with P — 1 lag predictor variables, and M = 3, the number of 
forward step subregions permitted in the ASTAR algorithm. Each simulation consists of 100 replications. The boxplots 
are for the estimates of the model parameters when ASTAR correctly identified the threshold model. The models of the 
simulations that ASTAR did not correctly identify as the threshold model (28) contained an incorrect number of subregions 
or lacked an AR(1) term in one of the two subregions. 
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Figure 9: THRESHOLD MODEL SIMULATION: ASTAR estimates for p\,p 2 = 7, .3 and = N(0, .25) from C simu- 
lations of a threshold model for increasing values of N> with P = 4 lag predictor variables, and M = 10, the number of 
forward step subregions permitted in the ASTAR algorithm. Each simulation consists of 100 replications. The boxplots 
are for the estimates of the model parameters when ASTAR correctly identified the threshold model. The models of the 
simulations that ASTAR did not correctly identify as the threshold model (28) contained an incorrect number of subregions , 
lacked an AR(1) term in one of the two subregions or contained terms with AV— 2» A r - 3 , or X r -\- 
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Figure 10: THRESHOLD MODEL SIMULATION: ASTAR estimates for pi,P 2 = —.6, .6 and cr\ = jV(0,.25) from 
C simulations of a threshold model for increasing values of N , with P — 1 lag predictor variables, and M = 3, the number 
of forward step subregions permitted in the ASTAR algorithm. Each simulation consists of 100 replications. The boxplots 
are for the estimates of the model parameters when ASTAR correctly identified the threshold model The models of the 
simulations that ASTAR did not correctly identify as the threshold model (28) contained an incorrect number of subregions 
or lacked an AR(l) term in one of the two subregions . 
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Figure 11: THRESHOLD MODEL SIMULATION: ASTAR estimates for pi,p 2 = —.6,. 6 and <7< = A(0,.25) from 
C simulations of a threshold model for increasing values of A, with P = 4 lag predictor variables, and M = 10, the 
number of forward step subregions permitted in the ASTAR algorithm. Each simulation consists of 100 replications. The 
boxplots are for the estimates of the model parameters when ASTAR correctly identified the threshold models The models 
of the simulations that ASTAR did not correctly identify as the threshold model (28) contained an incorrect number of 
subregions, lacked an AR(t) term in one of the two subregions or contained terms with A' r - 2 ,-V r - 3 , or X T -\- 
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— Wolf Sunspot Numbers 




Figure 12: The Wolf Sunspot Numbers ( 1700 - 1955 ) versus the fit of ASTAR Model 9 ( 1720 - 1920 ). The sunspot num- 
bers ( 1700 - 1719 ) were used for initialization. The sunspot numbers ( 1921 - 1955 ) were used to examine the prediction 
performance of ASTAR Model 9 and other models of the sunspot numbers. 



Estimated Normalized Periodogram of the Sunspot Data 1720-1920 





Figure 13: The estimated normalized periodogram of the Wolf Sunspot Numbers ( 1720 - 1920 ) [top] versus the estimated 
normalized periodogram of ASTAR Model 9 ( 1720 - 1920 ) [bottom]. The broad conclusion from the top periodogram is 
that there is a rather diffuse cycle in the data with a period of about 11 years, and a longer period of about 67 years. 
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Figure 14: The autocorrelation functions of the Wolf sunspot numbers and ASTAR Model 9 for the period 1720-1920. 
The dominant cycle of period approximately 11 years is clearly evident. 



Sunspot Data. (1720-1920) Residuals vs Time using MARS Model 9 
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Figure 15: Fitted residuals from ASTAR Model 9 of the Wolf sunspot numbers (1720-1920) versus year [top]. Fitted 
residuals versus the fitted sunspot numbers from ASTAR Model 9 of the Wolf sunspot numbers (1720-1920) [bottom]. 
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Figure 16: The autocorrelation function (first 40 lags) of the fitted residuals for ASTAR Model 9 of the Wolf sunspot 
numbers (1720-1920). There is no pattern of dependence in the residuals. 



Ascent - 4,5,4,5,4,5.4,4.4,4 f 4,4,5 
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Figure 17: The limit cycle for ASTAR Model 9 of the Wolf sunspot numbers (1720-1920). The limit cycle is 137 years 
long with the indicated ascent and descent periods. The limit cycle is generated using ASTAR Model 9 initialized with 
the sunspot numbers (1700-1719). The ‘subcycles’ have lengths of 10 or 11 years with 4 or 5 years per ascent period and 
6 or 7 years per descent period. 
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